Clustering Drives Assortativity and Community Structure in Ensembles of Networks 
Clustering, assortativity, and communities are key features of complex networks. We probe
dependencies between these attributes and find that ensembles with strong clustering display both high
assortativity by degree and prominent community structure, while ensembles with high assortativ- 
ity are much less biased towards clustering or community structure. Further, clustered networks 
can amplify small homophilic bias for trait assortativity. This marked asymmetry suggests that 
transitivity, rather than homophily, drives the standard nonsocial/social network dichotomy. 

PACS numbers: 89.75.Hc, 05.10.Ln, 89.75.Fb, 64.60.aq 



Networks provide convenient representations for di- 
verse phenomena spanning physical, technological, so- 
cial, biological and informational domains [US]. They 
are often complicated, historically contingent assemblies 
created by nonlinear processes. Just as it is meaningful 
to "explain" features of real networks with simple gen- 
erative mechanisms, it is also informative to ask what 
properties to expect given no other information about a 
network save that it has a certain set of properties. 

In fact, network properties can be markedly
interdependent [5, 6]. We focus on three key features of
undirected networks: (1) the clustering coefficient, C,
which reflects the tendency of the network to form
triangles (transitivity) [7, 8]; (2) the assortativity, r,
which reflects the tendency of similar nodes to connect to
one another (homophily) [9]; and (3) the modularity, Q,
which reflects the tendency of nodes to form tightly
interconnected communities [10].

We show that ensembles of networks constrained by a 
transitive bias to be strongly clustered also become highly 
degree-assortative and modular. In contrast, ensembles
constrained by a homophilic bias to be highly assortative 
show only weak clustering or modularity. Hence, at the 
ensemble level a fundamental asymmetry exists between 
transitivity and homophily. This asymmetry holds unless
the distribution of the number of links attached to
each node (the node's degree) is extremely broad.
Furthermore, a transitive bias can amplify the effect of a
homophilic bias towards trait (e.g. race, age, education)
assortativity [11] in network ensembles.

High values for the clustering, assortativity, and
modularity are often observed in real-world social networks,
while nonsocial networks may have low values [12].
Although an extensive social science literature posits
homophily to be a dominant force in social network
formation [11, 13] (since social networks are highly assortative),
our results show that a bias for transitive relationships
(also called "triadic closure" in the sociology literature [14])
is sufficient to obtain this effect in network ensembles.
Our work is complementary to that of Newman and Park,
who produce assortativity and clustering characteristic of
social networks by introducing modularity [12].

FIG. 1: The relationship between the clustering coefficient,
C, and the assortativity, r. Gray points represent social
networks, black points represent other types of networks.
Social networks: astro phys (scientific collaboration) [15];
condensed matter (scientific collaboration) [15]; Cyworld
(online social) [16]; dolphins (friendship) [17]; email
(communication) [18]; HEP (scientific collaboration) [15]; jazz (musical
collaboration) [19]; MySpace (online social) [16]; network
science (scientific collaboration) [20]; nioki (online social) [21];
orkut (online social) [16]; PGP (communication network) [22];
pussokram (online dating) [21]. Non-social networks:
C. elegans (neural) [23]; E. coli (metabolic) [24]; internet (router
level) [25]; power (connections between power stations) [7];
TAP (yeast protein-protein binding) [26]; word adjacency (in
English text) [20]; Y2H (yeast protein-protein binding) [27].

To begin, we note a distinct empirical correlation
between C and r in real networks, illustrated in Fig. 1, with
social networks (generally) in the high C, high r corner,
and non-social networks (generally) in the low C,
low r one. The pattern suggests an interdependence
between the two features that transcends a simple
nonsocial/social dichotomy. For instance, consider two
networks in Fig. 1: TAP is a high C, high r protein-protein
interaction network, generated by tandem affinity
purification experiments [28]; Y2H is a weakly clustered,
disassortative protein-protein interaction network, generated
using yeast two-hybrid screening [29]. The experimental
methodology, by itself, can explain the difference, since
TAP pulls out bound complexes and assigns links to
every pair of proteins in the complex, while Y2H tests each
pair of proteins individually for direct binding. Since
transitivity has a natural origin in the construction of
the TAP network, it is likely that the observed
assortativity arises solely as a byproduct of the interrelationship
between transitivity and assortativity rather than any
direct homophilic tendency between proteins.

TABLE I: Important values for the empirical networks

Since network properties often depend conspicuously
on the degree sequence (the number of links attached
to each node) [30], we consider ensembles of
networks constrained to have the same fixed degree
sequence (FDS). Three real-world networks are studied in
detail: a collaboration network of high energy physicists
(HEP) [15]; a collaboration network of network
scientists (NetSci) [20]; and an encrypted communication
network (PGP) [22]. We also examine a randomly generated
Erdős-Rényi network (ER) [31]. Basic network
parameters are given in Table I.

We use a rewiring procedure [32, 33] to sample from
each ensemble. At each step of the procedure two links
are chosen at random and their endpoints are exchanged,
unless this would create a double link, in which case the
step is skipped. This move set preserves the degree of
each node but otherwise randomizes connections. To
sample ensembles with specific features, we use a network
Hamiltonian $H(G)$ [34-37] to define an exponential
ensemble by assigning a sampling weight
$P(G) \propto e^{-H(G)}$ to each graph G. Here we consider
ensembles where $H(G)$ depends on C, r and/or trait
assortativity, defined below. Denoting the number of
triangles in G by $n_\Delta$, the degree of node i by $k_i$,
and the number of nodes by N, the clustering coefficient
is defined as

$$C = \frac{3 n_\Delta}{\frac{1}{2}\sum_{i=1}^{N} (k_i - 1)\,k_i} \qquad (1)$$

Assortativity by degree is defined as the Pearson corre- 
lation coefficient between the degrees of nodes joined by 
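a link [9]:

$$r = \frac{L^{-1}\sum_i j_i k_i - \left[L^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}{L^{-1}\sum_i \tfrac{1}{2}(j_i^2 + k_i^2) - \left[L^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2} \qquad (2)$$

where L is the number of links in the network and $j_i$ and
$k_i$ are the degrees of the nodes at each end of link i.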
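
The Pearson correlation over links can be evaluated directly from an edge list. The sketch below is one possible implementation, assuming the standard link-based form of the degree correlation from Ref. [9]; the function name `degree_assortativity` and the choice to return NaN for regular graphs (where the variance vanishes) are assumptions of this example.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link.

    `edges` is a list of (u, v) pairs of an undirected simple graph.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    n_links = len(edges)
    # Averages over links of j*k, (j + k)/2 and (j^2 + k^2)/2.
    mean_prod = sum(degree[u] * degree[v] for u, v in edges) / n_links
    mean_end = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / n_links
    mean_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / n_links

    variance = mean_sq - mean_end ** 2
    if variance == 0:
        return math.nan  # all link ends have the same degree: r is undefined
    return (mean_prod - mean_end ** 2) / variance


# A star is perfectly disassortative: the hub only links to leaves.
r_star = degree_assortativity([(0, 1), (0, 2), (0, 3)])
```

For the three-leaf star the sketch gives r = -1, the most disassortative value possible.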

To get ensembles with specific values of C or r we use
the following Hamiltonians:

$$H_{C'} = \beta\,|C' - C_t|, \qquad H_{r'} = \beta\,|r' - r_t| \qquad (3)$$

where $C'$ is the current clustering coefficient and $C_t$ is
the target value, and similarly for $r'$. The parameter $\beta$
controls the strength of the bias towards the target. It is a
transitive bias in $H_{C'}$ and a homophilic bias in $H_{r'}$.

We employ simulated annealing based on a standard
Metropolis-Hastings procedure with a rewiring move
set [38, 39]. One pair of links in the network G is switched
to produce a new candidate network $G'$. A valid move is
accepted with probability

$$p = e^{H(G) - H(G')} \qquad (4)$$

and rejected with probability $1 - p$. If $p > 1$ the move is
accepted. Initially, the network is rewired $2 \times 10^5$ times at
$\beta = 0$ to randomize links and avoid strong hysteresis [37].
Then $\beta$ is increased slowly, rewiring $5 \times 10^4$ times after
each increase until C (or r) hits $C_t$ (or $r_t$). The first
network with $C = C_t$ ($r = r_t$) is a single sample from
the ensemble of networks with a fixed degree sequence
and $C = C_t$ ($r = r_t$). The whole process then repeats,
starting with the $\beta = 0$ quench.
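
The sampling loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the code used for the results: all function names, the tolerance `tol` used to decide that C has "hit" the target, the step size `beta_step`, and the much smaller step counts are choices made for this example. For simplicity it recomputes C from scratch after every move, which would be too slow for the network sizes studied here.

```python
import math
import random
from itertools import combinations


def clustering(adj):
    """Eq. (1): 3 * triangles / connected triples, from neighbour sets."""
    corners = sum(1 for nbrs in adj.values()
                  for u, v in combinations(nbrs, 2) if v in adj[u])
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return corners / triples if triples else 0.0


def propose_swap(edges, adj, rng):
    """Pick links (a, b), (c, d) and propose (a, d), (c, b); None if invalid."""
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:          # random orientation of the second link
        c, d = d, c
    if len({a, b, c, d}) < 4:       # would create a self-loop or change nothing
        return None
    if d in adj[a] or b in adj[c]:  # would create a double link: skip the step
        return None
    return i, j, (a, d), (c, b)


def apply_swap(edges, adj, i, j, new_i, new_j):
    """Replace links i and j by new_i and new_j; return the old pair for undo."""
    old = (edges[i], edges[j])
    for u, v in old:
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in (new_i, new_j):
        adj[u].add(v)
        adj[v].add(u)
    edges[i], edges[j] = new_i, new_j
    return old


def metropolis_step(edges, adj, beta, c_target, rng):
    """One rewiring attempt, accepted with probability min(1, e^{H(G) - H(G')})."""
    move = propose_swap(edges, adj, rng)
    if move is None:
        return False
    i, j, new_i, new_j = move
    h_old = beta * abs(clustering(adj) - c_target)
    old = apply_swap(edges, adj, i, j, new_i, new_j)
    h_new = beta * abs(clustering(adj) - c_target)
    if h_new > h_old and rng.random() >= math.exp(h_old - h_new):
        apply_swap(edges, adj, i, j, *old)  # reject: restore the old links
        return False
    return True


def anneal_to_clustering(edge_list, c_target, tol=0.01, beta_step=20.0,
                         steps=2000, max_rounds=100, seed=0):
    """Randomise at beta = 0, then raise beta until |C - c_target| <= tol."""
    rng = random.Random(seed)
    edges = list(edge_list)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    beta = 0.0                      # the first round is the beta = 0 quench
    for _ in range(max_rounds + 1):
        for _ in range(steps):
            metropolis_step(edges, adj, beta, c_target, rng)
            if beta > 0 and abs(clustering(adj) - c_target) <= tol:
                return edges
        beta += beta_step
    return None                     # target not reached within the step budget
```

Because every accepted or rejected move is a double link swap, the degree of each node is unchanged throughout, so any network returned belongs to the fixed degree sequence ensemble of the input.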

We also study the influence of transitivity on trait
assortativity, $r_d$, which measures the tendency for nodes to
connect to others with the same discrete trait (e.g. race,
gender, etc.) [9]. For this we add a homophilic bias $\beta_d$
for links between nodes with the same trait. Defining
$r_d \propto e_{\delta\delta}$, where $e_{\delta\delta}$ is the fraction of links in the
network from a node of type $\delta$ to another node of type $\delta$,
the Hamiltonian becomes

$$H_d = \beta\,|C - C_t| + \beta_d \sum_{\delta} e_{\delta\delta} \qquad (5)$$

Choosing different values of $C_t$ and $\beta_d$ allows one to
explore how transitivity impacts trait assortativity at the
ensemble level.
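
To make the trait term concrete, the sketch below evaluates the sum of $e_{\delta\delta}$ over traits as the fraction of links whose two endpoints carry the same trait, and combines it with the clustering term as in Eq. (5). The function names, the dictionary `trait` mapping nodes to labels, and the helper `assign_traits` are assumptions of this example; the sign in front of $\beta_d$ is taken literally from the equation as printed.

```python
import random


def assign_traits(nodes, n_traits=3, seed=0):
    """Give each node one of `n_traits` labels uniformly at random."""
    rng = random.Random(seed)
    return {node: rng.randrange(n_traits) for node in nodes}


def same_trait_fraction(edges, trait):
    """Sum over traits of e_dd: the fraction of links joining same-trait nodes."""
    same = sum(1 for u, v in edges if trait[u] == trait[v])
    return same / len(edges)


def trait_hamiltonian(c_current, c_target, edges, trait, beta, beta_d):
    """Eq. (5) as written: beta * |C - C_t| + beta_d * sum_d e_dd."""
    return (beta * abs(c_current - c_target)
            + beta_d * same_trait_fraction(edges, trait))


# Path 0-1-2-3 where nodes 0, 1 share one trait and nodes 2, 3 another.
path = [(0, 1), (1, 2), (2, 3)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
h_example = trait_hamiltonian(0.3, 0.5, path, labels, beta=10.0, beta_d=3.0)
```

Two of the three links in this path join same-trait nodes, so the trait term contributes 3 x 2/3 = 2 and the clustering term contributes 10 x 0.2 = 2.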

We examine ensembles constrained to have a particular 
value of r (resp. C) and measure the value for the other 
feature C (resp. r) averaged over 100 samples from the 
ensemble. Results are shown in Fig. 2. The grey (resp.
black) symbols show the values for ensembles with con- 
strained r (resp. C). Increasing transitivity to increase C 
has a strong influence on r in all cases, whereas increas- 
ing homophily to increase r has relatively little impact 
on C. The asymmetry is strongest for narrow degree 
distributions (e.g. the ER network), and becomes less 
pronounced, but still apparent, as the degree distribu- 
tion broadens. 

FIG. 2: Controlling assortativity (grey symbols) vs.
controlling clustering coefficient (black symbols) for various network
degree sequences. C is on the x-axis, r on the y-axis. Each
point represents average values from 100 samples from an
ensemble with specified r or C values. The dashed lines show
the values of r and C for the original network. Note the
asymmetry between the effect of C on r compared to r on C.

The asymmetric relationship between r and C can
be understood as follows: For nodes to participate in
as many transitive relationships as possible, their
neighbours must be of similar degree. Hence increasing
clustering also increases r. Increasing r leads to links between
nodes of similar degree, but these relationships need not
be transitive. For narrow degree distributions, one could
divide all nodes of degree k into two groups and only
permit links between the two groups. Assortativity would
be maximum, in the absence of any clustering.
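
The two-group construction can be checked on a toy graph. In the sketch below, which is an illustrative example rather than one of the ensembles studied here, the four nodes of degree 2 form a complete bipartite block and the six nodes of degree 3 form another, so every link joins nodes of equal degree and no triangle can exist. The node labels and function names are hypothetical.

```python
from itertools import combinations


def bipartite_block(left, right):
    """All links between two groups, none inside either group."""
    return [(u, v) for u in left for v in right]


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    n = len(edges)
    mean_prod = sum(deg[u] * deg[v] for u, v in edges) / n
    mean_end = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / n
    mean_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / n
    return (mean_prod - mean_end ** 2) / (mean_sq - mean_end ** 2)


def clustering_coefficient(edges):
    """3 * triangles / connected triples, computed from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    corners = sum(1 for nbrs in adj.values()
                  for u, v in combinations(nbrs, 2) if v in adj[u])
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return corners / triples if triples else 0.0


# Degree-2 nodes: two groups of two. Degree-3 nodes: two groups of three.
toy_edges = (bipartite_block(["a0", "a1"], ["b0", "b1"])
             + bipartite_block(["c0", "c1", "c2"], ["d0", "d1", "d2"]))
r_toy = degree_assortativity(toy_edges)
c_toy = clustering_coefficient(toy_edges)
```

Here r takes its maximum value of 1 while C = 0, showing that perfect degree assortativity does not require a single transitive relationship.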

On the other hand, for broad degree distributions (like 
PGP) only a few nodes of high degree exist, but they 
have a large effect on r. Hence for large r, the high- 
est degree nodes are under strong pressure to link, thus 
creating many transitive relationships. Many social net- 
works do not have broad degree distributions. In such 
cases homophily has only a weak influence on C at the 
ensemble level. 

Fig. 2 also indicates the C and r values for the
real-world networks (dashed lines). Ensembles of networks
constrained to have the same C as the real network
exhibit far greater r. Hence, social networks are actually
disassortative relative to the ensemble of networks with
the same clustering coefficient and degree sequence [40].
Indeed, the most likely way to create many triangles is 
to densely interconnect the higher degree nodes so tri- 
angles clump together (as discussed in Ref. [37]). Real 
social networks seem to spread clustering more evenly 
across the network, thus lowering r. For example in sci- 
entific collaboration networks, supervisory relationships 
may decrease the assortativity by creating links between 
lower degree students and higher degree professors. 

FIG. 3: (Color Online) Modularity Q for various ensembles
of networks with different target values for C (top row) or
r (bottom row). Clustering has a much larger impact on
modularity than assortativity does.

We next consider the influence of r and C on modularity.
Many methods for extracting community structure
exist [41, 42]. For definiteness, we use the one proposed
by Newman and Girvan [10]: Given a partition of the
network, $e_{ij}$ is the fraction of all edges connecting a node
in community i to one in community j, and $a_i = \sum_j e_{ij}$
is the fraction of all link ends attached to nodes in
community i. The modularity of the network given
partition $\mathcal{P}$ is defined as:

$$Q_{\mathcal{P}} = \sum_i \left( e_{ii} - a_i^2 \right) \qquad (6)$$

We use an agglomerative method [43] to approximate the
best partition and largest $Q_{\mathcal{P}}$, which we denote Q.
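
For a given partition, the modularity of Ref. [10] is straightforward to evaluate from an edge list. The sketch below does only that; it does not search for the best partition and is not the agglomerative method of Ref. [43]. The function name `modularity` and the dictionary `community` mapping nodes to community labels are assumptions of this example.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for a fixed assignment of nodes to communities.

    e_ii is the fraction of links inside community i and a_i is the
    fraction of link ends attached to nodes of community i.
    """
    n_links = len(edges)
    e_within = {}
    a = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_within[cu] = e_within.get(cu, 0.0) + 1.0 / n_links
        # Each link contributes half of its weight to each of its two ends.
        a[cu] = a.get(cu, 0.0) + 0.5 / n_links
        a[cv] = a.get(cv, 0.0) + 0.5 / n_links
    return sum(e_within.get(c, 0.0) - a[c] ** 2 for c in a)


# Two triangles joined by a single bridge link (2, 3).
barbell = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_split = modularity(barbell, split)
```

With one community per triangle, each community holds 3/7 of the links and half of the link ends, giving Q = 2 (3/7 - 1/4) = 5/14. Placing all nodes in a single community gives Q = 0.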

The top (resp. bottom) panel in Fig. 3 shows the
average Q in ensembles with constrained C (resp. r).
Transitivity has a more pronounced effect on modularity than
does homophily. The modularity achieved for the highly
clustered ensembles approximates the actual modularity
for the real networks (HEP, NetSci, and PGP; see
Table I), unlike assortative ensembles without a transitive
bias.

Finally, we consider the effect of transitivity on trait
assortativity, $r_d$. For each of the degree sequences, we
create ensembles of networks with different target C values
and varying homophilic biases $\beta_d$. Since the actual data
sets do not contain trait values, we assign each node one
of three possible traits at random with equal probability.
For ER, HEP, and NetSci we observe that ensembles with
larger C enhance $r_d$ relative to ensembles with the same
homophilic bias but no clustering (C = 0). This is
especially clear for the narrowest (ER) degree sequence. For
the PGP network, which has a broad degree distribution,
clustering appears to compete with the homophilic bias
(e.g. the curves cross), leading to a more complicated
scenario. The interdependence between clustering and
trait assortativity thus appears to depend on the degree
sequence, but for narrow degree sequences the positive
relationship holds and transitivity enhances the effect of
homophilic bias. We also note that increasing the trait
assortativity of an ensemble had no impact on C, r, or
Q (data not shown).

FIG. 4: (Color Online) Trait assortativity $r_d$ (y-axis) for
ensembles of networks with varying C (indicated in the legend)
and homophilic bias $\beta_d$ (x-axis). For narrow degree
distributions clustering amplifies the response of trait assortativity to
homophilic bias. For broad degree distributions the opposite
occurs for small $\beta_d$.

We conjecture that the standard nonsocial/social
(disassortative/assortative) dichotomy is driven by transitive
relationships in many social networks, such as in
scientific collaborations. As shown here, transitivity typically
leads to assortativity. This explains the anomalous
position of TAP located within social networks, and is
consistent with another anomaly in Fig. 1: several online social
networks show low clustering and low assortativity [44].
If assortative mixing by degree is the result of homophily
by degree in social networks, this anomaly is hard to
explain: why should popular people stop seeking each other
out simply because the social network moved online? But
if assortativity is a side-effect of transitivity, this effect
is easier to understand: it is plausible that online
social relationships are less transitive, since in the absence
of spatially mediated interactions there is a smaller
tendency to introduce mutual friends. We have not ruled
out the scenario in Ref. [12]. Indeed, the causal
factors driving network evolution are likely to be
complex, multifaceted, and idiosyncratic. Our results on the
asymmetric dependencies between clustering, assortativity,
and modularity provide a warning about inferring
causality from naive observations of network structure.